This RMD covers analysis of variance using discrete wavelet transforms. I’ll draw mostly on Gencay et al. (2001) and the waveslim package. The ultimate goal is to categorize our inflation data according to two criteria. The first is fine vs. coarse variance, that is, items whose ‘energy’ occurs mainly at high frequencies (say, a few months) versus lower frequencies (say, a couple of years). The second is price-quantity correlation. Hopefully, this will give us something like the categories that Gardiner Means produced as administered prices versus market prices.
In the sections that follow, I will…
I’ll finish with some preliminary analysis of the relationship between the administered vs. market price categories and the pandemic inflation.
The following code (not shown in markdown) will import and format the BEA data from csv tables.
The following function (not shown in markdown) allows querying the main tables (exp, qua, and pri) based on item levels, where goods and services are level 1; durable goods, nondurable goods, and HH cons exp on services are level 2; and so on. Often, we’ll want the lowest (i.e., most granular) level, which can be retrieved with the lowestLevel = T parameter. This is the same function as in Process PCE Data.Rmd.
This is an example of a plot from the data, specifically quantities.
This will plot price data on a single plot. It’s set up to run for all items, but the code is commented out.
I looked through each of these and a few things stood out:
This will plot price and quantity data on a single plot. It’s set up to run for all items, but the code is commented out. I’m going to use it to check the issue of duplicate price series mentioned above.
Here’s the list of items sharing the same or very similar price data as the item(s) around them.
## [1] "1 New domestic autos"
## [1] "2 New foreign autos"
## [1] "32 Bicycles and accessories"
## [1] "33 Pleasure boats"
## [1] "34 Pleasure aircraft"
## [1] "35 Other recreational vehicles"
## [1] "95 Government employees' expenditures abroad"
## [1] "96 Private employees' expenditures abroad"
## [1] "98 Tenant-occupied mobile homes"
## [1] "99 Tenant-occupied stationary homes and landlord durables"
## [1] "100 Owner-occupied mobile homes"
## [1] "101 Owner-occupied stationary homes"
## [1] "103 Group housing (23)"
## [1] "112 Specialty outpatient care facilities and health and allied services"
## [1] "113 All other professional medical services"
## [1] "114 Nonprofit hospitals' services to households"
## [1] "115 Proprietary hospitals"
## [1] "116 Government hospitals"
## [1] "117 Nonprofit nursing homes' services to households"
## [1] "118 Proprietary and government nursing homes"
## [1] "120 Auto leasing"
## [1] "121 Truck leasing"
## [1] "126 Taxicabs and ride sharing services"
## [1] "127 Intracity mass transit"
## [1] "142 Casino gambling"
## [1] "143 Lotteries"
## [1] "144 Pari-mutuel net receipts"
## [1] "148 Elementary and secondary school lunches"
## [1] "149 Higher education school lunches"
## [1] "151 Meals at other eating places"
## [1] "152 Meals at drinking places"
## [1] "154 Food supplied to civilians"
## [1] "155 Food supplied to military"
## [1] "170 Household insurance premiums and premium supplements"
## [1] "171 Less: Household insurance normal losses"
## [1] "182 Proprietary and public higher education"
## [1] "183 Nonprofit private higher education services to households"
## [1] "197 Clothing repair, rental, and alterations"
## [1] "198 Repair and hire of footwear"
## [1] "211 Repair of furniture, furnishings, and floor coverings"
## [1] "212 Repair of household appliances"
## [1] "215 U.S. travel outside the United States"
## [1] "216 U.S. student expenditures"
Note that the housing price numbers aren’t identical, but they’re very similar, so I included them here. The same goes for household insurance, for U.S. travel outside the U.S. and U.S. student expenditures, and for new foreign and domestic autos.
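Because some of these pairs are only near-duplicates, a tolerance-based check may be handier than an exact match. Below is a minimal Python sketch, not the notebook's hidden R code, that flags consecutive items whose price series differ by at most a small relative amount. The `series` dictionary, the helper names, and the 0.5% tolerance are all hypothetical.

```python
def max_rel_diff(a, b):
    """Largest relative difference between two equal-length price series.

    Assumes the second series has no zeros (true for price indexes).
    """
    return max(abs(x - y) / abs(y) for x, y in zip(a, b))


def flag_near_duplicates(series, tol=0.005):
    """Return pairs of consecutive items whose series differ by at most `tol`.

    `series` maps item labels to price-index lists, in table order
    (dicts preserve insertion order). The 0.5% default is illustrative.
    """
    labels = list(series)
    pairs = []
    for left, right in zip(labels, labels[1:]):
        if max_rel_diff(series[left], series[right]) <= tol:
            pairs.append((left, right))
    return pairs


# Hypothetical values, for illustration only
prices = {
    "1 New domestic autos": [100.0, 101.0, 102.5],
    "2 New foreign autos": [100.0, 101.1, 102.4],
    "3 New light trucks": [100.0, 103.0, 106.0],
}
print(flag_near_duplicates(prices))
```

Comparing only neighboring items mirrors how the duplicates above were spotted. An all-pairs comparison would be the more thorough, but slower, alternative.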
Energy is defined as the sum of squared values of a vector. For a series with mean zero, energy divided by the length of the series is its variance, and the discrete wavelet transform is energy (variance) preserving. Hence the sum of squared values of a time series (x) equals the sum of the squared wavelet detail coefficients (d) across all scales (1 through J), plus the sum of the squared smooth coefficients (s). Gencay et al. (2001, 125) write this as
\(||x||^{2} = \sum_{j = 1}^{J}||d_j||^2 + ||s_J||^2\)
where \(||\cdot||^2\) is just the sum of the squared values in the vector. In words, the energy of the original series equals the total energy of the wavelet coefficients at all scales, including the smooth.
Using data from IBM’s stock returns in the 1960s, Gencay et al. (2001, 127-8) plot the wavelet energies “normalized by \(N^{-1}\),” which is to say they take the sum of squared coefficients for each scale and divide it by that scale's length (i.e., its number of coefficients). Dividing by the number of coefficients is necessary with a DWT because, by construction, each coarser scale has half as many coefficients as the finer scale below it. Finer scales will therefore typically have higher sums of squared coefficients simply because they have many more coefficients.
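To make energy preservation and the \(N^{-1}\) normalization concrete, here is a minimal Python sketch. It swaps in the Haar filter for brevity, whereas the analysis below uses LA(8) through waveslim, and it is not waveslim's implementation. The series `x` is hypothetical.

```python
import math


def haar_dwt(x, levels):
    """Partial Haar DWT via the pyramid algorithm.

    Returns (details, smooth), where details[j - 1] holds the level-j wavelet
    coefficients. len(x) must be a multiple of 2**levels.
    """
    if len(x) % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    details, v = [], list(x)
    for _ in range(levels):
        half = len(v) // 2
        # neighbor differences -> detail; neighbor sums -> next smooth
        details.append([(v[2 * t + 1] - v[2 * t]) / math.sqrt(2) for t in range(half)])
        v = [(v[2 * t + 1] + v[2 * t]) / math.sqrt(2) for t in range(half)]
    return details, v


def energy(v):
    """Sum of squared values."""
    return sum(c * c for c in v)


def normalized_energies(details):
    """Energy of each level divided by its number of coefficients (N / 2**j)."""
    return [energy(d) / len(d) for d in details]


# Hypothetical series standing in for an item's log price differences
x = [math.sin(0.3 * t) + 0.1 * (-1) ** t for t in range(64)]
details, smooth = haar_dwt(x, levels=4)
# The two totals agree up to floating-point rounding
print(energy(x), sum(energy(d) for d in details) + energy(smooth))
print(normalized_energies(details))
```

The details shrink from 32 to 16, 8, and 4 coefficients, which is why the raw sums of squares are not comparable across scales without the division.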
In terms of choosing the wavelet, the authors also note that as “the length of the wavelet filter increases, the approximation to an ideal band-pass filter improves and therefore the wavelet filter will better capture the variability in the frequency intervals associated with the DWT wavelet coefficients.” Hence, below I will use the LA(8) wavelet, not the Haar (which has length 2).
So the idea here is to see which frequencies have the most energy, which is similar to asking whether the time series has a lot of long-period versus short-period variance. To demonstrate, the following will decompose the energy of the DWT for a single item. Note that our data has around 745 months. We may move on to MODWTs later, which don’t have this restriction, but standard DWTs need dyadic lengths (\(2^2 = 4, 2^3 = 8,\) 16, 32, 64, 128, 256, &c.). However, partial DWTs (see Gencay et al. 2001, 124) should work just as well, and since we really don’t care about frequencies beyond a business cycle (which we’ll call 128 months at most), we just need a sample size divisible by 128. This means 640 should work for us, so the following starts at the most recent month available (currently, Nov. 2021) and grabs everything 641 months back (to 1968), making 640 observations after taking the difference.
*Note: we may want to consider cutting this down to start in Jan. 1983, for two reasons. First, Gyun Gu has shown that there was a significant change in corporate pricing, a trend toward much greater stability, around 1983. Second, data for net transactions in used trucks and the used truck margin (i.e., for used light trucks) start in Jan. 1983. Because the used auto market has been such a big part of the pandemic inflation (and presumably used light trucks are, too, though I haven’t confirmed that), it may be advisable to include these details. Note also that other important items don’t start until later as well, including video and audio streaming and rental (1982) and software (1977).
Starting in 1983 would give 467 months up to Nov. 2021. But that’s not ideal, because the closest we could get with a partial DWT would be 384 months, running from late 1989 to the present. We could go down to a J = 6 (64 months) partial DWT, which would allow for 448 months, or we could strictly use MODWTs. Having finished this Rmd, I’d recommend using the shorter period (384 months) for the energy analysis below, and the longer period (1983 on) for the price-quantity correlation, since it relies on the MODWT.
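The sample-length arithmetic above reduces to taking the largest multiple of \(2^J\) that fits. Here is a small Python check of the numbers; the helper name is hypothetical.

```python
def usable_length(n_obs, levels):
    """Largest sample size <= n_obs accepted by a partial DWT of depth `levels`,
    i.e. the largest multiple of 2**levels."""
    block = 2 ** levels
    return (n_obs // block) * block


# Jan. 1983 to Nov. 2021 is 467 monthly levels, i.e. 466 log differences
n_diffs = 467 - 1
print(usable_length(n_diffs, 7))  # blocks of 128 months -> 384
print(usable_length(n_diffs, 6))  # blocks of 64 months  -> 448
print(usable_length(640, 7))      # the 1968-2021 sample -> 640
```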
The following code will produce the breakdown of energies by scale for the price data of eggs as an example.
You can see that eggs have a fairly high degree of energy (the average scale energy across all scales and items, excluding the smooth, came out to about 0.00016, partly inflated by extremely high energies for securities commissions). But you can also see that most of that energy is in scales 4 and 5, which correspond to 16-32 and 32-64 month periods. In other words, we’re looking mostly at price changes over roughly 1-5 years.
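The scale-to-period correspondence used here (level j of monthly data covers periods of roughly \(2^j\) to \(2^{j+1}\) months) can be written out as a tiny Python helper. Both function names are hypothetical, and the egg energies themselves are not reproduced.

```python
def period_band(j):
    """Approximate range of periods (in months) captured by DWT level j on
    monthly data: 2**j to 2**(j + 1) months."""
    return 2 ** j, 2 ** (j + 1)


def dominant_scale(energies):
    """1-based level with the largest energy (smooth excluded)."""
    return max(range(len(energies)), key=lambda i: energies[i]) + 1


for j in range(1, 6):
    lo, hi = period_band(j)
    print(f"d{j}: {lo}-{hi} months ({lo / 12:.1f}-{hi / 12:.1f} years)")
```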
The following (not shown in markdown) will produce a table of all (least aggregated) items for this time period, including whether they’re durable goods, nondurable goods, or services, and their energies by scale (not including the smooth). The table is ordered by d1 energy (that is, the energy of the finest scale, which represents price changes over 2-4 months).
And here are some histograms (one for each scale) showing the distribution of items in terms of energy. (Note: the three items with very high energies, namely televisions, other video equipment, and securities commissions, are excluded.)
The following simply gives the items’ ranks in terms of energies at each scale (where rank 1 indicates lowest energy among all items at that scale). The table is ordered by the fourth scale, but it is not printed in the markdown.
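The ranking step can be sketched in a few lines of Python. The input layout (`{item: [e_d1, e_d2, ...]}`) is a hypothetical stand-in for the hidden R table, and ties are broken by table order.

```python
def rank_by_scale(energies):
    """Map {item: [e_d1, e_d2, ...]} to {item: [rank_d1, rank_d2, ...]},
    where rank 1 is the lowest energy at that scale."""
    items = list(energies)
    n_scales = len(next(iter(energies.values())))
    ranks = {item: [0] * n_scales for item in items}
    for j in range(n_scales):
        # sorted() is stable, so ties keep their original table order
        ordered = sorted(items, key=lambda item: energies[item][j])
        for r, item in enumerate(ordered, start=1):
            ranks[item][j] = r
    return ranks
```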
This might not be much help, but to look for a clear cut-off point between high-energy items (i.e., market items, presumably at short periods, that is, the first scale or two) and low-energy items (i.e., administered prices), here are histograms of the summed energies for the first two details and for the 3rd-5th details.
It seems like items with combined energies at the first two scales of less than 0.0004 might appropriately be called administered prices, with those above that being called market prices. But we’ll want to look into the matter more before calling that the distinguishing characteristic.
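As a Python sketch, the tentative rule above amounts to a single threshold on the d1 + d2 energy sum. The function name is hypothetical, and 0.0004 is just the cutoff read off the histograms.

```python
ADMIN_CUTOFF = 0.0004  # tentative d1 + d2 energy threshold read off the histograms


def classify_by_fine_energy(energies, cutoff=ADMIN_CUTOFF):
    """Label one item from its per-scale energies [e_d1, e_d2, ...]:
    'market' if d1 + d2 energy exceeds the cutoff, otherwise 'admin'."""
    return "market" if energies[0] + energies[1] > cutoff else "admin"
```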
Out of curiosity, I’m going to build a table of items and the ratio of their 1st- to 4th-scale energies. My thinking here is that there might be groups of items with high 2-4 month energies (the first scale) but low 16-32 month energies (the market prices), items with the opposite (the admin prices), and items with either high or low energies at both scales (the indeterminate). So I’m looking for a bimodal distribution here.
They seem to be more or less normally distributed around 1, that is, approximately the same energy at scale 1 as scale 4 (this is roughly true for scale 1 to scales 2, 3, and 5 as well). A ratio of 1.5 or 2 might make for a good cutoff (ratios above being market prices, below being admin prices), but it seems pretty arbitrary to me.
The ratio of the first detail energy to the fifth (32-64 months) is similar, so let’s just pick a ratio here of 2 as the cutoff.
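The ratio rule can be sketched the same way in Python. The function name and the (ratio, label) return shape are hypothetical, and the cutoff of 2 is the one picked above.

```python
def classify_by_ratio(energies, fine=1, coarse=5, cutoff=2.0):
    """Return (ratio, label) for one item's per-scale energies [e_d1, e_d2, ...].

    ratio = energy at level `fine` / energy at level `coarse`;
    label is 'market' when the ratio exceeds `cutoff`, else 'admin'.
    """
    ratio = energies[fine - 1] / energies[coarse - 1]
    return ratio, ("market" if ratio > cutoff else "admin")
```

One property of the arithmetic worth keeping in mind: a near-zero d5 energy makes the ratio blow up, so items with very little coarse-scale variance get labeled 'market' almost regardless of their fine-scale behavior.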
This seems like it might be a workable approach as the 5th scale (representing about 4 years) (1) is long enough to clearly represent the sort of long-term price changes we’d expect to see from administered prices, (2) is substantially longer than one year, such that we aren’t picking up seasonal pricing, but (3) isn’t so long as to potentially pick up business cycle fluctuations.
After looking at the items that this approach would categorize as market, I don’t think this is a good approach. I’m guessing the issue is that some items have very low scale 5 energies.
As Gencay et al. (2001, 241) explain, for a series \(\textbf{x} = (x_0, x_1, ..., x_{N-1})\) of length N, a MODWT of order J produces wavelet coefficients \(\tilde{w}\). An unbiased estimator of the wavelet variance is given by:
\[\tilde{\sigma}^{2}_{x}(\lambda_j) = \frac{1}{\tilde{N}_j} \sum_{t=L_j-1}^{N-1} \tilde{w}^{2}_{j,t}\]
where \(L_j = (2^j - 1)(L - 1) + 1\) is the length of the level-j wavelet filter (associated with scale \(\lambda_j\)), L is the length of the base filter (e.g., 8 for LA(8)), and \(\tilde{N}_j = N - L_j + 1\) is the number of coefficients unaffected by the boundary. Confidence intervals can be estimated for the above in a variety of ways (see Gencay et al. 2001, 242-4).
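Here is a minimal Python sketch of this estimator. It uses a Haar MODWT (L = 2, so \(L_j = 2^j\)) computed with the pyramid algorithm and circular boundary handling, rather than the LA(8) filter used in the analysis. The function names are hypothetical, and this is not waveslim's code.

```python
L = 2  # Haar filter length; LA(8) would have L = 8


def haar_modwt(x, levels):
    """Haar MODWT via the pyramid algorithm with circular boundary handling.

    Returns (details, smooth): one length-N coefficient vector per level,
    plus the level-J scaling coefficients.
    """
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # filters are upsampled, not the data downsampled
        w = [0.5 * (v[t] - v[(t - shift) % n]) for t in range(n)]
        v = [0.5 * (v[t] + v[(t - shift) % n]) for t in range(n)]
        details.append(w)
    return details, v


def wavelet_variance(x, levels):
    """Unbiased wavelet variance per level: mean of the squared coefficients
    t = L_j - 1, ..., N - 1, i.e. those unaffected by the circular boundary."""
    details, _ = haar_modwt(x, levels)
    n = len(x)
    variances = []
    for j, w in enumerate(details, start=1):
        l_j = (2 ** j - 1) * (L - 1) + 1  # width of the level-j filter
        n_j = n - l_j + 1                 # number of boundary-free coefficients
        variances.append(sum(c * c for c in w[l_j - 1:]) / n_j)
    return variances


print(wavelet_variance([1.0, -1.0] * 16, levels=2))  # [1.0, 0.0]
```

A series that flips sign every month puts all of its variance at the finest level, so the level-2 estimate is exactly zero.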
Wavelet covariance and correlation (i.e., for each scale of the wavelet transform) can similarly be estimated for a bivariate time series, which for our study means price and quantity for a given item. Likewise, lags and leads can be introduced to obtain wavelet cross-covariance and cross-correlation. However, because DWTs are not translation invariant, these require use of the MODWT (see Gencay et al. 2001, 252-3). The unbiased estimator for the wavelet covariance of a bivariate series \(X = ((x_{1,0}, x_{2,0}), (x_{1,1}, x_{2,1}), \ldots, (x_{1,N-1}, x_{2,N-1}))\), with MODWT coefficients \(\tilde{w}_1\) and \(\tilde{w}_2\) for the two series, is given by:
\[\tilde{\gamma}_{X}(\lambda_j) = \frac{1}{\tilde{N}_j} \sum_{t=L_j-1}^{N-1} \tilde{w}_{1,j,t} \tilde{w}_{2,j,t}\]
(Gencay et al. 2001, 253). Confidence intervals can be estimated for wavelet covariance as in Gencay et al. (2001, 254-5).
Finally, wavelet correlation simply normalizes the wavelet covariance by the variance of the wavelet coefficients of the two series:
\[\rho_X(\lambda_j) = \frac{\gamma_X(\lambda_j)}{\sigma_1(\lambda_j)\sigma_2(\lambda_j)}\]
which, as usual, will take a value between -1 and 1. The (biased) estimator for this can be computed with the previous equations (Gencay et al. 2001, 258-9), and confidence intervals can be calculated as in Gencay et al. (2001, 259-60).
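As a minimal Python sketch of the plug-in estimator, the following takes two MODWT coefficient vectors for the same level (e.g., price and quantity) and the level-j filter width \(L_j\). It drops the first \(L_j - 1\) boundary-affected coefficients and forms the covariance and both variances with the \(1/\tilde{N}_j\) sums above. The function name is hypothetical. Confidence intervals are not computed here.

```python
def wavelet_correlation(w1, w2, l_j):
    """Wavelet correlation at one level from two MODWT coefficient vectors.

    Uses only the boundary-free coefficients t = L_j - 1, ..., N - 1 for the
    covariance and both variances, then takes their ratio.
    """
    a, b = w1[l_j - 1:], w2[l_j - 1:]
    n_j = len(a)
    cov = sum(p * q for p, q in zip(a, b)) / n_j
    var1 = sum(p * p for p in a) / n_j
    var2 = sum(q * q for q in b) / n_j
    return cov / (var1 * var2) ** 0.5
```

Because wavelet coefficients are treated as mean-zero, no sample means are subtracted, which matches the form of the covariance and variance estimators above.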
To demonstrate, below are, for three items, plots of the original series (differences of logs) together with their MODWT coefficients (first plot: price; second: quantity), followed by cross-correlation plots with red lines indicating 95% confidence intervals. The code for the correlation component of the following is taken directly from the waveslim documentation (for the function spin.covariance) and recreates Figure 7.9 in Gencay et al. (2001, 261), but using our price and quantity data.
First, for gasoline.
Note that price and quantity for gasoline are fairly correlated at coarser scales, with some significant correlations at various lags and leads. For instance, there is a negative contemporaneous (zero-lag) correlation at scale four, which can be interpreted as: the price of gasoline and the quantity sold tend to move in opposite directions over periods of 16-32 months. A similar pattern at scale 3, which covers 8-16 month periods and so includes the annual cycle, suggests that this might be related to seasonal patterns.
The correlations for gas are not as high as one might expect, though. A high price-quantity correlation that would more clearly indicate a market price is evident for fresh fruit:
In contrast, new domestic autos show some slight correlation at finer scales (perhaps due to semiannual sales? or negotiations at the dealership?), but otherwise no correlation.
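The lead/lag structure in these plots comes from correlating one series' coefficients at a given scale with shifted copies of the other's. Below is a simplified Python sketch using an ordinary (demeaned) correlation over the overlapping segment at each lag. It is an illustration only, not waveslim's `spin.covariance`/`spin.correlation` code, and it omits boundary handling and confidence intervals. The function names and test series are hypothetical.

```python
def lagged_correlation(a, b, lag):
    """Correlation between a[t] and b[t + lag] over their overlapping part.

    A positive lag pairs a with later values of b (a leads b).
    """
    if lag >= 0:
        x, y = a[: len(a) - lag], b[lag:]
    else:
        x, y = a[-lag:], b[: len(b) + lag]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5


def cross_correlation(a, b, max_lag):
    """Correlations for every lag from -max_lag to +max_lag."""
    return {k: lagged_correlation(a, b, k) for k in range(-max_lag, max_lag + 1)}


# Hypothetical check: b is a copied two periods later, so the peak is at lag 2
a = [((7 * t) % 11) - 5 + 0.1 * t for t in range(30)]
b = [0.0, 0.0] + a[:-2]
cc = cross_correlation(a, b, 4)
print(max(cc, key=cc.get))
```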
Based on the above approach, the following code (not displayed in markdown) will build a table of price-quantity correlations for all items.
I’ll set the start date to Jan 1983.
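The summary table later reports, for each item, the highest correlation and the scale where it occurs. Below is a hypothetical Python helper for that step. It assumes "highest" means largest in absolute value, so strong negative correlations count.

```python
def strongest_correlation(corr_by_scale):
    """Return (scale, correlation) for the scale whose correlation has the
    largest absolute value; scales are numbered from 1 (finest)."""
    j = max(range(len(corr_by_scale)), key=lambda i: abs(corr_by_scale[i]))
    return j + 1, corr_by_scale[j]
```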
Lastly, and just for good measure, I’ll calculate a couple of basic metrics for the extent of inflation per item during the pandemic. The first is the percentage change in price for each item between January 2020 and the most recent month (currently Nov. 2021). The second is the % change from the average monthly price change over the 10 years prior to the pandemic to the average monthly price change during the pandemic. Histograms are plotted, because who doesn’t like a histogram, though I’m not presently comparing them to other periods. Also shown are the items with the largest price changes (in ascending order).
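Here is a hypothetical Python sketch of the two metrics. Several details are assumptions, not taken from the hidden R code: monthly changes are simple percentage changes rather than log differences, the pre-pandemic window ends in Dec. 2019, and the second metric is computed relative to the absolute value of the pre-pandemic average.

```python
def pct_change(start, end):
    """Percentage change from start to end."""
    return (end / start - 1) * 100


def avg_monthly_change(prices):
    """Mean month-over-month percentage change of a price series."""
    changes = [pct_change(p0, p1) for p0, p1 in zip(prices, prices[1:])]
    return sum(changes) / len(changes)


def pandemic_metrics(pre_prices, pandemic_prices):
    """pre_prices: monthly prices for the ten years before Jan. 2020;
    pandemic_prices: Jan. 2020 through the latest month (assumed split)."""
    price_chg = pct_change(pandemic_prices[0], pandemic_prices[-1])
    pre_avg = avg_monthly_change(pre_prices)
    pan_avg = avg_monthly_change(pandemic_prices)
    avg_chg = (pan_avg - pre_avg) / abs(pre_avg) * 100  # assumed definition
    return price_chg, pre_avg, pan_avg, avg_chg
```

Dividing by `abs(pre_avg)` keeps the sign meaningful when pre-pandemic prices were falling. It also makes the metric explode for items whose pre-pandemic average is near zero, which may explain extreme outliers like wine.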
## Item Pan.Price.Pct.Chg
## 207 Domestic services 11.62545
## 209 Repair of furniture, furnishings, and floor coverings 11.78749
## 210 Repair of household appliances 11.78844
## 82 Flowers, seeds, and potted plants 11.87176
## 208 Moving, storage, and freight services 12.01004
## 54 Eggs 12.13912
## 51 Fish and seafood 12.16087
## 1 New domestic autos 12.24893
## 2 New foreign autos 12.24954
## 33 Pleasure boats 12.32813
## 32 Bicycles and accessories 12.32824
## 34 Pleasure aircraft 12.32837
## 35 Other recreational vehicles 12.32857
## 97 Less: Personal remittances in kind to nonresidents 12.44174
## 11 Furniture 12.68718
## 160 Pension funds 12.69235
## 3 New light trucks 12.76665
## 150 Meals at limited service eating places 13.02493
## 92 Tobacco (127) 13.05817
## 95 Government employees' expenditures abroad 13.40133
## 96 Private employees' expenditures abroad 13.40145
## 50 Poultry 14.88862
## 165 Portfolio management and investment advice services 17.62670
## 163 Indirect commissions 19.17813
## 48 Pork 20.50268
## 15 Major household appliances 21.90615
## 66 Food produced and consumed on farms (6) 24.17428
## 75 Fuel oil 24.20154
## 47 Beef and veal 27.04562
## 107 Natural gas (28) 32.55022
## 73 Gasoline and other motor fuel 33.25851
## 76 Other fuels 33.90403
## 7 Net transactions in used trucks 47.58872
## 4 Net transactions in used autos 47.58888
## 6 Employee reimbursement 50.79947
## 122 Motor vehicle rental 50.80002
## 8 Used truck margin 79.83953
## 5 Used auto margin 79.84207
The items that have been making headlines stand out in the first metric, most notably the used car market, and within that the used auto margin (the difference between what dealers pay for used cars and what they charge). Used light trucks show the same pattern. Meats are also showing up, especially beef, but so are major household appliances.
One point of interest is on the negative side: government-supplied food items (school lunches, &c.) all register nearly identical price declines of roughly 50%. This is probably an anomaly of the survey method, but it’s worth asking: how much is this depressing the headline inflation number?
As for the second metric, you can see in the last plot (with the 45-degree line for parity) that points cluster around the diagonal. This indicates that most items saw similar average monthly changes during the pandemic as in the decade prior, though the changes were usually modestly higher during the pandemic than before it. But there are also outliers, which I’ll look at with ggplots below.
Lastly, we’ll build a table of all the items and whether they’re classified as market or admin prices according to the different methods developed above. These are (for market prices)…
In columns 2 and 3 of the table below, 1 indicates a market item and 0 an administered item.
Following that in the table are columns for the highest contemporaneous price-quantity correlation and the scale of that correlation, the same for lagged correlations, then the percentage price change between Jan. 2020 and the most recent available month (Nov. 2021), and lastly the price-quantity beta coefficient and p-value for the simple regression. The table is ordered according to the price change.
First, I’ll check how the price-quantity correlation compares to the simple regression p-value.
Interestingly, the two don’t match up as much as I would have expected. In fact, while 150 of the 192 items have p-values of less than 0.01 associated with the beta coefficient of the regression, only 45 have correlations greater than 0.8 in the wavelet scale with the highest correlation. Of course, the regression results are rudimentary, but this calls for further methodological study in any case.
The following compares the energy of the first two scales and the 1st to 5th scale ratio to the highest correlation.
Well, that’s not the best way to plot that, but for now it’s good enough to make me think the energy approach isn’t very useful.
All in all, I think the price-quantity approach is best suited for distinguishing market from administered prices, though energy could be useful in related analyses. It’s worth noting, though, that to the extent that the low-scale (i.e., high-frequency) energies don’t tell us much, the traditional approach of counting months with zero price change is probably also not a very sound method.
In fact, the wavelet method in general suggests a more suitable approach, as it recognizes price changes at different frequencies. This is similar in spirit to the traditional approach, but with more detailed information about those frequencies. That is, whereas the traditional approach can only tell us how often there is no price change, the wavelet approach can tell us, e.g., how much or how little frequent price changes (say, in the 2-4 month range) contribute to total price volatility. It seems to me that this is at least as sound as the traditional approach, and it provides additional information. For instance, prices may change at a lower frequency reflecting the planning processes of price administrators (though it must also be noted that price changes at higher frequencies may also reflect those planning processes, e.g., temporary sales).
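Here is a toy Python comparison of the two kinds of metric on hypothetical monthly price-change series. The traditional share of zero-change months is set against the share of energy in the finest Haar DWT level (2-4 month band). The Haar filter stands in for LA(8), and both helper names are hypothetical.

```python
import math


def zero_change_share(changes, tol=0.0):
    """Traditional metric: share of months with no price change."""
    return sum(abs(c) <= tol for c in changes) / len(changes)


def d1_energy_share(changes):
    """Share of total energy in the finest Haar DWT level (2-4 month band).

    Assumes an even number of observations.
    """
    d1 = [(changes[2 * t + 1] - changes[2 * t]) / math.sqrt(2)
          for t in range(len(changes) // 2)]
    total = sum(c * c for c in changes)
    return sum(c * c for c in d1) / total


flip = [1.0, -1.0] * 8                      # price reverses every month
step = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0] * 2  # runs of rises and pauses
for name, s in [("flip", flip), ("step", step)]:
    print(name, zero_change_share(s), d1_energy_share(s))
```

The first series has no zero-change months and all of its energy at the finest scale. The second has zero changes half the time but no energy at all in the 2-4 month band. The zero-change count alone says nothing about where in the frequency range the variation sits.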
I want to do a quick check of the pandemic price change against price-quantity correlation. Here’s a plot, along with some zooms of the same:
[Note: the analysis below refers to the original correlation numbers, which were run on the longer dataset. These have changed now that we’re only selecting from 1983 on.]
The horizontal axis alone is kind of interesting: items with higher correlations tend to have those correlations at coarser scales (lower frequency). An argument could be made that the only proper market prices are those with high correlations at finer scales (i.e. the darker colored dots that are also toward the left or right extremes on the above plots), although as noted earlier, this could also reflect sales among administered price items.
More generally, nothing in the plots above jumps out at me as showing that the inflation is the result of either market prices alone or administered prices generally. Rather, it looks like certain items have seen particularly high inflation (which is consistent with what many have been reporting), especially at used car dealerships, meat processors, and household appliance producers. This perspective does raise additional questions, though. For instance, the price rise for processed and fresh fruits has been about the same, despite the latter clearly being more market-priced and the former more administered; so what’s going on with meat?
And, of course, what’s going on with used cars? Plot 7 above shows most of them. Note that new domestic autos have some p-q correlation (foreign autos do not), and the used auto margin falls in between (although net transactions in used autos and used light trucks have greater correlations). Yet the inflation for new autos has been at the high end of the main cluster of items, but nowhere near where the used autos are. That demands an explanation, and I have trouble saying it’s all chip shortages (but…maybe?).
Lastly, of course, this is just one way to look at the pandemic inflation. We’ll develop others (including the difference-in-differences approach), and I think we should simply compare those results to the correlation values we’re looking at above.
Here’s the plot using the other metric for the pandemic inflation (% change in average monthly price changes, pandemic vs. decade prior):
(Note: wine is way off the scale so it’s been removed from view)
And one more pass to look at the average monthly price change pre- versus during pandemic. Average monthly change in price in the decade prior to the pandemic is on the x axis, and during the pandemic is on the y axis. The color reflects the correlation between price and quantity at the finest (2-4 month) scale.
Here’s the same thing but with color reflecting highest correlation value.
Here’s the same thing but with color reflecting highest correlation value, only showing items with |highest correlation| > 0.5.
And the same again, but only showing items with |highest correlation| <= 0.5.